Introduction
This article explains how to import and process data with the annex package when the required data is available as tabular text files (CSV).
Two files are used for the demonstration: demo_Bedroom.txt (contains the measurement data) and demo_Bedroom_config.TXT (contains the configuration; see article Config file).
Both files can easily be read using base R functions, namely read.table() and its wrapper functions such as read.csv(), utils::read.delim() etc. (see ?read.table for more details).
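As a small illustration of how read.csv() relates to read.table(), the following sketch writes a tiny CSV file to a temporary location and reads it with both functions. The file content and column names are made up for the demonstration and are not taken from the demo files.

```r
# Hypothetical demo content; not the actual demo_Bedroom.txt file
tmp <- tempfile(fileext = ".txt")
writeLines(c("X,temp,humidity",
             "2011-01-01 00:01:26,18.8,51",
             "2011-01-01 00:06:25,18.8,51"), tmp)

# read.csv() is read.table() with defaults suited for comma-separated files
a <- read.csv(tmp)
b <- read.table(tmp, header = TRUE, sep = ",", quote = "\"",
                dec = ".", fill = TRUE, comment.char = "")

identical(a, b)  # compare the two results
unlink(tmp)      # remove the temporary file again
```

Spelling out the arguments is mainly useful when a file deviates from the plain CSV layout, for example when it contains comment lines.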
Reading the data
The first step is to import both (i) the measurement data (stored in raw_df) and (ii) the configuration (stored in config):
raw_df <- read.csv("demo_Bedroom.txt")
config <- read.table("demo_Bedroom_config.TXT",
                     comment.char = "#", sep = "",
                     header = TRUE, na.strings = c("NA", "empty"))
# see ?read.table for details
# Class and dimension of the objects
c("raw_df" = is.data.frame(raw_df), "config" = is.data.frame(config))
## raw_df config
##   TRUE   TRUE
cbind("raw_df" = dim(raw_df), "config" = dim(config))
##      raw_df config
## [1,]  51890      8
## [2,]      8      5
Both objects are of class data.frame with a dimension of \(51890 \times 8\) (raw_df) and \(8 \times 5\) (config) respectively.
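To make the effect of the read.table() arguments used above more tangible, the following sketch reads a small config-like table from a character vector. The content is hypothetical and only loosely modelled on the config shown in this article: sep = "" splits on any whitespace, comment.char = "#" skips comment lines, and na.strings turns the entries "NA" and "empty" into missing values.

```r
# Hypothetical config content; not the actual demo_Bedroom_config.TXT file
txt <- c("# lines starting with a hash are skipped",
         "column    variable  study      home         room",
         "X         datetime  empty      empty        empty",
         "co2       CO2       DEMO_STUD  Casa_Blanca  Bed")

cfg <- read.table(text = txt, comment.char = "#", sep = "",
                  header = TRUE, na.strings = c("NA", "empty"))

dim(cfg)          # rows and columns read
is.na(cfg$study)  # entries given as 'empty' are read as NA
```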
The first few observations (rows) of the two objects look as follows:
head(raw_df[, 1:4], n = 3) # First four columns only
## X radonShortTermAvg temp humidity
## 1 2011-01-01 00:01:26 151 18.8 51
## 2 2011-01-01 00:06:25 151 18.8 51
## 3 2011-01-01 00:11:25 151 18.8 51
head(config, n = 3)
## column variable study home room
## 1 X datetime <NA> <NA> <NA>
## 2 co2 CO2 DEMO_STUD Casa_Blanca Bed
## 3 humidity rH DEMO_STUD Casa_Blanca Bed
The object raw_df contains variables (columns) named “X”, “radonShortTermAvg”, “temp”, “humidity”, which are the original names from the text file. The config object defines what the columns in raw_df contain and where they belong. For more details read the article about the Config file.
Checking the config object
To check whether the config object meets the expectations of the annex package, the function annex_check_config() can be used. If problems are detected, an error is thrown (see Config file). Otherwise, the function is silent, as in this example:
library("annex")
annex_check_config(config)
No errors are reported; the config object meets the annex requirements. Note that this step is not necessary, as it is performed automatically when calling annex_prepare(), but it can be handy during development.
Preparing data
While raw_df contains the raw data set, the config object contains the information on how to rename the columns and where the observations belong. annex_prepare() is a helper function to prepare the data set for further steps.
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
## [1] "datetime" "study" "home" "room" "CO2" "Light"
## [7] "Pressure" "Radon" "RH" "T" "VOC"
## Error in annex_prepare(raw_df, config, quiet = TRUE): variable `datetime` (originally column `X`) must be of class POSIXt
At this moment we get an error as the variable containing the date and time information is not a proper datetime object (object of class POSIXt) but a character. As the information comes in a proper ISO format, we simply convert the column (column X in raw_df) and call annex_prepare() again.
# see ?as.POSIXct for details and options
raw_df <- transform(raw_df, X = as.POSIXct(X, tz = "UTC"))
class(raw_df$X)
## [1] "POSIXct" "POSIXt"
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
head(prepared_df)
## datetime study home room CO2 Light Pressure Radon RH
## 1 2011-01-01 00:01:26 DEMO_STUD Casa_Blanca BED 470 0 1026.5 151 51
## 2 2011-01-01 00:06:25 DEMO_STUD Casa_Blanca BED 477 0 1026.5 151 51
## 3 2011-01-01 00:11:25 DEMO_STUD Casa_Blanca BED 483 0 1026.5 151 51
## 4 2011-01-01 00:16:25 DEMO_STUD Casa_Blanca BED 477 0 1026.5 151 51
## 5 2011-01-01 00:21:25 DEMO_STUD Casa_Blanca BED 481 0 1026.4 151 51
## 6 2011-01-01 00:26:25 DEMO_STUD Casa_Blanca BED 483 0 1026.4 168 51
## T VOC
## 1 18.8 136
## 2 18.8 142
## 3 18.8 131
## 4 18.8 140
## 5 18.8 135
## 6 18.7 131
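The conversion above works without further arguments because the timestamps follow the ISO format. If a file stores them in a different layout, as.POSIXct() accepts a format argument describing that layout. The following sketch uses made-up timestamps in a day.month.year layout; both the values and the layout are assumptions for illustration only.

```r
# Hypothetical timestamps in a non-ISO layout (day.month.year hour:minute)
x <- c("01.01.2011 00:01", "01.01.2011 00:06")

# Describe the layout via 'format'; see ?strptime for the conversion codes
dt <- as.POSIXct(x, format = "%d.%m.%Y %H:%M", tz = "UTC")

inherits(dt, "POSIXt")           # the class required for the datetime variable
format(dt, "%Y-%m-%d %H:%M:%S")  # same information, printed in ISO layout
```

Setting tz explicitly avoids surprises from the local time zone of the machine running the code.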
annex_prepare() performs a series of tasks. It
- checks the config object (calls annex_check_config() internally) and, if the config object is valid,
- renames the variables (columns) in raw_df and checks that they are of the correct class,
- informs the user (just a note) about columns in raw_df not included in config and about additional columns defined in config which do not occur in raw_df,
- ensures that datetime is a proper datetime object (POSIXt), and
- returns the modified (possibly subsetted) object.
The checks of missing/additional definitions in config are intended to inform the user about possible misspecifications and will not result in an error.
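For readers who want to see what such steps could look like when done by hand, here is a minimal base R sketch. It is not the implementation used by annex_prepare(); the miniature raw and cfg objects, including the column "unused", are hypothetical and only mimic the structure used in this article.

```r
# Hypothetical miniature inputs; NOT the implementation of annex_prepare()
raw <- data.frame(X = as.POSIXct("2011-01-01 00:01:26", tz = "UTC"),
                  temp = 18.8, humidity = 51, unused = 1)
cfg <- data.frame(column   = c("X", "temp", "humidity", "co2"),
                  variable = c("datetime", "T", "RH", "CO2"),
                  stringsAsFactors = FALSE)

# Columns present in the data but not in the config, and vice versa (notes only)
not_in_config <- setdiff(names(raw), cfg$column)
not_in_data   <- setdiff(cfg$column, names(raw))
if (length(not_in_config)) message("Not in config: ", toString(not_in_config))
if (length(not_in_data))   message("Not in data: ", toString(not_in_data))

# Keep the columns defined in the config and rename them
keep <- intersect(cfg$column, names(raw))
res  <- raw[, keep, drop = FALSE]
names(res) <- cfg$variable[match(keep, cfg$column)]

# The datetime variable must be a proper datetime object
stopifnot(inherits(res$datetime, "POSIXt"))
names(res)
```

The mismatches are reported with message() rather than stop(), mirroring the statement above that they are notes and not errors.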
Performing the analysis
Once the data set is prepared properly (note that annex_prepare() is a convenience function; the same steps can also be done manually), the final object can be created.
Prepare annex object
annex() is the creator function which creates an object of class annex (S3) providing a series of methods and functions to conduct the final analysis.
The function expects a formula as input which describes how to process the data. The three parts of the formula are:
<measurements to be processed> ~ <datetime> | <grouping variables>
- The first part defines which variables (measurements) should be processed
- Part two is always ~ datetime; the date and time information for the statistics
- Part three the grouping, typically study + home + room
annex_df <- annex(Radon + T + VOC ~ datetime | study + home + room, data = prepared_df)
head(annex_df)
## datetime study home room season tod Radon T VOC
## 1 2011-01-01 00:01:26 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 136
## 2 2011-01-01 00:06:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 142
## 3 2011-01-01 00:11:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 131
## 4 2011-01-01 00:16:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 140
## 5 2011-01-01 00:21:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 135
## 6 2011-01-01 00:26:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 168 18.7 131
class(annex_df)
## [1] "annex" "data.frame"
A series of S3 methods exist for annex objects which might be extended in the future.
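To illustrate the three-part structure of the formula described above, the following sketch takes the formula apart using base R only. It shows how R parses such an expression and makes no claim about how annex() processes the formula internally.

```r
# Same formula as in the annex() call of this article
f <- Radon + T + VOC ~ datetime | study + home + room

lhs <- f[[2]]  # left of '~':  Radon + T + VOC
rhs <- f[[3]]  # right of '~': datetime | study + home + room (a call to `|`)

all.vars(lhs)       # variables (measurements) to be processed
all.vars(rhs[[2]])  # the datetime variable
all.vars(rhs[[3]])  # the grouping variables
```

Creating a formula does not evaluate it, so the variable names do not need to exist as objects in the workspace for this to run.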
Performing analysis
Based on the object returned by annex() the analysis can be performed by calling annex_stats(). The function aggregates the data based on the formula provided above, calculates a series of statistical properties, and returns an object of class annex_stats.
head(annex_df)
## datetime study home room season tod Radon T VOC
## 1 2011-01-01 00:01:26 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 136
## 2 2011-01-01 00:06:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 142
## 3 2011-01-01 00:11:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 131
## 4 2011-01-01 00:16:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 140
## 5 2011-01-01 00:21:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 151 18.8 135
## 6 2011-01-01 00:26:25 DEMO_STUD Casa_Blanca BED 12-02 23-07 168 18.7 131
stats <- annex_stats(annex_df, format = "long")
head(stats)
## study home room season tod variable stats value
## 1 DEMO_STUD Casa_Blanca BED 12-02 all Radon Mean 190.7551
## 2 DEMO_STUD Casa_Blanca BED 12-02 all Radon Sd 60.8227
## 3 DEMO_STUD Casa_Blanca BED 12-02 all Radon N 25185.0000
## 4 DEMO_STUD Casa_Blanca BED 12-02 all Radon NAs 0.0000
## 5 DEMO_STUD Casa_Blanca BED 12-02 all Radon p00 57.0000
## 6 DEMO_STUD Casa_Blanca BED 12-02 all Radon p00.5 69.0000
By default, the argument format is set to "wide", which returns the statistics in a wide format. The choice does not matter for the further analysis, but one format or the other can be more convenient when the results are processed manually.
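The difference between a long and a wide layout can be illustrated generically with base R. The sketch below uses a small made-up table with the columns variable, stats and value as in the long output above; the numbers are invented and the resulting wide layout is not necessarily identical to the one returned by annex_stats().

```r
# Hypothetical statistics in long format (values are made up)
long <- data.frame(room     = "BED",
                   variable = rep(c("Radon", "T"), each = 2),
                   stats    = rep(c("Mean", "Sd"), times = 2),
                   value    = c(190.8, 60.8, 18.5, 0.9),
                   stringsAsFactors = FALSE)

# One row per room and variable, one column per statistic
wide <- reshape(long, direction = "wide",
                idvar = c("room", "variable"),
                timevar = "stats", v.names = "value")

# reshape() prefixes the new columns with 'value.'; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

The long layout is handy for filtering by statistic, while the wide layout is easier to scan by eye.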
Writing output file
The final step is to write the data into the final standardized file format. annex_write_stats() takes an annex_stats object (returned by annex_stats(); long or wide format) and a file name/path where the data should be stored.
In addition, a user (integer; user ID provided by the project team) must be provided.
TODO(R): Currently no update method is available.
annex_write_stats(stats, file = "final_Bedroom.xlsx", user = 123, quiet = TRUE)
This creates the file final_Bedroom.xlsx when successful.
